Overview

Dataset statistics

Number of variables: 24
Number of observations: 150622
Missing cells: 341
Missing cells (%): < 0.1%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 27.6 MiB
Average record size in memory: 192.0 B
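The percentages and per-record size in the table above follow from the raw counts. The short check below is only a sketch: the variable names are illustrative, and it assumes that one MiB is 1024 * 1024 bytes and that the reported 27.6 MiB is a rounded figure.

```python
# Figures copied from the "Dataset statistics" table.
n_variables = 24
n_observations = 150622
missing_cells = 341
total_size_mib = 27.6  # rounded in the report

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: 1 MiB = 1024 * 1024 bytes.
avg_record_bytes = total_size_mib * 1024 * 1024 / n_observations

print(total_cells)                   # number of cells in the table
print(f"{missing_pct:.4f}%")         # well below the "< 0.1%" shown
print(f"{avg_record_bytes:.1f} B")   # close to the reported 192.0 B
```

The result lands slightly above 192 B because the total size is rounded; 192 B is also what 24 columns of 8 bytes each would give, which may be how the report arrives at that figure.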

Variable types

BOOL: 15
NUM: 8
CAT: 1

Reproduction

Analysis started: 2020-06-09 12:34:06.913424
Analysis finished: 2020-06-09 12:34:29.803638
Duration: 22.89 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
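The reported duration can be checked against the two timestamps. This is a minimal sketch using only the standard library; the format string is an assumption about how the timestamps are laid out.

```python
from datetime import datetime

# Assumed layout of the timestamps shown in the report.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2020-06-09 12:34:06.913424", FMT)
finished = datetime.strptime("2020-06-09 12:34:29.803638", FMT)

# Elapsed wall-clock time in seconds.
duration = (finished - started).total_seconds()
print(f"{duration:.2f} seconds")
```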

Warnings

admitdiagnosis has a high cardinality: 426 distinct values (High cardinality)
patientunitstayid is highly correlated with df_index (High correlation)
df_index is highly correlated with patientunitstayid (High correlation)
df_index has unique values (Unique)
patientunitstayid has unique values (Unique)
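As a rough illustration of how the "Unique" and "High cardinality" warnings above could be derived from distinct counts, consider the sketch below. The function name and the threshold of 50 are assumptions made for this example, not values confirmed by the report.

```python
def column_warnings(distinct_count, n_rows, is_categorical,
                    cardinality_threshold=50):
    """Return warning labels for one column (hypothetical helper).

    cardinality_threshold is an assumed cut-off, not taken from the report.
    """
    warnings = []
    # A column is unique when every row holds a different value.
    if distinct_count == n_rows:
        warnings.append("Unique")
    # Categorical columns with many distinct values are flagged.
    if is_categorical and distinct_count > cardinality_threshold:
        warnings.append("High cardinality")
    return warnings


print(column_warnings(426, 150622, is_categorical=True))      # admitdiagnosis
print(column_warnings(150622, 150622, is_categorical=False))  # df_index
```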

Variables

df_index
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct count: 150622
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 81908.17210633241
Minimum: 0
Maximum: 163656
Zeros: 1
Zeros (%): < 0.1%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 0
5-th percentile: 8087.05
Q1: 40954.25
Median: 81845.5
Q3: 122967.75
95-th percentile: 155618.95
Maximum: 163656
Range: 163656
Interquartile range (IQR): 82013.5

Descriptive statistics

Standard deviation: 47272.76369
Coefficient of variation (CV): 0.5771434336
Kurtosis: -1.201342145
Mean: 81908.17211
Median Absolute Deviation (MAD): 41008.5
Skewness: 0.001084073896
Sum: 1.23371727e+10
Variance: 2234714187
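Several of the df_index statistics above are derived from others. The sketch below recomputes them from the reported (rounded) values; the variable names are illustrative, and small differences from the report are expected because the inputs are rounded.

```python
# Values copied from the df_index statistics.
q1, q3 = 40954.25, 122967.75
minimum, maximum = 0, 163656
mean, std = 81908.17211, 47272.76369

iqr = q3 - q1                     # interquartile range
value_range = maximum - minimum   # range
cv = std / mean                   # coefficient of variation
variance = std ** 2               # variance is the squared standard deviation

print(iqr, value_range)
print(round(cv, 6))
print(round(variance))
```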
Histogram with fixed size bins (bins=10)

Common values

Value                    Count    Frequency (%)
2047                     1        < 0.1%
66826                    1        < 0.1%
113949                   1        < 0.1%
111900                   1        < 0.1%
99610                    1        < 0.1%
105753                   1        < 0.1%
103704                   1        < 0.1%
126231                   1        < 0.1%
124182                   1        < 0.1%
130325                   1        < 0.1%
Other values (150612)    150612   > 99.9%

Minimum 5 values

Value    Count   Frequency (%)
0        1       < 0.1%
2        1       < 0.1%
3        1       < 0.1%
4        1       < 0.1%
5        1       < 0.1%

Maximum 5 values

Value    Count   Frequency (%)
163656   1       < 0.1%
163655   1       < 0.1%
163653   1       < 0.1%
163652   1       < 0.1%
163651   1       < 0.1%
 

patientunitstayid
Real number (ℝ≥0)

HIGH CORRELATION
UNIQUE

Distinct count: 150622
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1769139.3832839825
Minimum: 141168
Maximum: 3353254
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 141168
5-th percentile: 229562.65
Q1: 969206
Median: 1685783.5
Q3: 2750760.75
95-th percentile: 3206468.55
Maximum: 3353254
Range: 3212086
Interquartile range (IQR): 1781554.75

Descriptive statistics

Standard deviation: 986592.8281
Coefficient of variation (CV): 0.5576682298
Kurtosis: -1.308267745
Mean: 1769139.383
Median Absolute Deviation (MAD): 904042.5
Skewness: 0.02576842912
Sum: 2.664713122e+11
Variance: 9.733654085e+11

Histogram with fixed size bins (bins=10)

Common values

Value                    Count    Frequency (%)
2872418                  1        < 0.1%
2468827                  1        < 0.1%
3137052                  1        < 0.1%
1090878                  1        < 0.1%
3188823                  1        < 0.1%
2399545                  1        < 0.1%
1101107                  1        < 0.1%
1668259                  1        < 0.1%
1582382                  1        < 0.1%
2637101                  1        < 0.1%
Other values (150612)    150612   > 99.9%

Minimum 5 values

Value     Count   Frequency (%)
141168    1       < 0.1%
141194    1       < 0.1%
141197    1       < 0.1%
141203    1       < 0.1%
141208    1       < 0.1%

Maximum 5 values

Value     Count   Frequency (%)
3353254   1       < 0.1%
3353251   1       < 0.1%
3353235   1       < 0.1%
3353226   1       < 0.1%
3353216   1       < 0.1%

verbal
Real number (ℝ)

Distinct count: 6
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.9633519671761097
Minimum: -1
Maximum: 5
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -1
5-th percentile: 1
Q1: 4
Median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.634573332
Coefficient of variation (CV): 0.4124219461
Kurtosis: 0.2951436799
Mean: 3.963351967
Median Absolute Deviation (MAD): 0
Skewness: -1.330168977
Sum: 596968
Variance: 2.671829976

Histogram with fixed size bins (bins=10)

Common values

Value   Count   Frequency (%)
5       94986   63.1%
1       25774   17.1%
4       19164   12.7%
3       5094    3.4%
2       3310    2.2%
-1      2294    1.5%

Minimum 5 values

Value   Count   Frequency (%)
-1      2294    1.5%
1       25774   17.1%
2       3310    2.2%
3       5094    3.4%
4       19164   12.7%

Maximum 5 values

Value   Count   Frequency (%)
5       94986   63.1%
4       19164   12.7%
3       5094    3.4%
2       3310    2.2%
1       25774   17.1%
 

motor
Real number (ℝ)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.415576741777429
Minimum: -1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -1
5-th percentile: 1
Q1: 6
Median: 6
Q3: 6
95-th percentile: 6
Maximum: 6
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.447528396
Coefficient of variation (CV): 0.2672897948
Kurtosis: 7.570351288
Mean: 5.415576742
Median Absolute Deviation (MAD): 0
Skewness: -2.863280855
Sum: 815705
Variance: 2.095338458

Histogram with fixed size bins (bins=10)

Common values

Value   Count    Frequency (%)
6       118554   78.7%
5       12864    8.5%
1       7774     5.2%
4       7707     5.1%
-1      2294     1.5%
3       895      0.6%
2       534      0.4%

Minimum 5 values

Value   Count    Frequency (%)
-1      2294     1.5%
1       7774     5.2%
2       534      0.4%
3       895      0.6%
4       7707     5.1%

Maximum 5 values

Value   Count    Frequency (%)
6       118554   78.7%
5       12864    8.5%
4       7707     5.1%
3       895      0.6%
2       534      0.4%
 

eyes
Real number (ℝ)

Distinct count: 5
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4370277914248915
Minimum: -1
Maximum: 4
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -1
5-th percentile: 1
Q1: 3
Median: 4
Q3: 4
95-th percentile: 4
Maximum: 4
Range: 5
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.061591122
Coefficient of variation (CV): 0.3088689376
Kurtosis: 4.084455152
Mean: 3.437027791
Median Absolute Deviation (MAD): 0
Skewness: -2.102489131
Sum: 517692
Variance: 1.126975711

Histogram with fixed size bins (bins=10)

Common values

Value   Count    Frequency (%)
4       106418   70.7%
3       22482    14.9%
1       11988    8.0%
2       7440     4.9%
-1      2294     1.5%

Minimum 5 values

Value   Count    Frequency (%)
-1      2294     1.5%
1       11988    8.0%
2       7440     4.9%
3       22482    14.9%
4       106418   70.7%

Maximum 5 values

Value   Count    Frequency (%)
4       106418   70.7%
3       22482    14.9%
2       7440     4.9%
1       11988    8.0%
-1      2294     1.5%
 

admitdiagnosis
Categorical

HIGH CARDINALITY

Distinct count: 426
Unique (%): 0.3%
Missing: 341
Missing (%): 0.2%
Memory size: 1.1 MiB

Value                 Count    Frequency (%)
SEPSISPULM            7526     5.0%
AMI                   6263     4.2%
CVASTROKE             5800     3.9%
CHF                   5548     3.7%
SEPSISUTI             4614     3.1%
DKA                   4296     2.9%
S-CABG                4274     2.8%
RHYTHATR              3963     2.6%
EMPHYSBRON            3810     2.5%
PNEUMBACT             3430     2.3%
Other values (416)    100757   66.9%

Length

Max length: 10
Median length: 9
Mean length: 8.102109918
Min length: 3

thrombolytics
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       148105   98.3%
1       2517     1.7%
 

aids
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       150448   99.9%
1       174      0.1%
 
hepaticfailure
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       148137   98.4%
1       2485     1.6%
 

lymphoma
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       149913   99.5%
1       709      0.5%
 
metastaticcancer
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       147464   97.9%
1       3158     2.1%
 

leukemia
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       149495   99.3%
1       1127     0.7%
 
immunosuppression
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       146464   97.2%
1       4158     2.8%
 

cirrhosis
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       147746   98.1%
1       2876     1.9%
 

activetx
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count   Frequency (%)
1       88220   58.6%
0       62402   41.4%
 

ima
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       145861   96.8%
1       4761     3.2%
 

midur
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       149205   99.1%
1       1417     0.9%
 

ventday1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       114051   75.7%
1       36571    24.3%
 
oobventday1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       100565   66.8%
1       50057    33.2%
 
oobintubday1
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       111335   73.9%
1       39287    26.1%
 

diabetes
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 MiB

Value   Count    Frequency (%)
0       114559   76.1%
1       36063    23.9%
 

creatinine
Real number (ℝ)

Distinct count: 1624
Unique (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.053861972354636
Minimum: -1.0
Maximum: 24.95
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -1
5-th percentile: -1
Q1: 0.51
Median: 0.83
Q3: 1.38
95-th percentile: 4.18
Maximum: 24.95
Range: 25.95
Interquartile range (IQR): 0.87

Descriptive statistics

Standard deviation: 1.856314306
Coefficient of variation (CV): 1.76143969
Kurtosis: 17.70543987
Mean: 1.053861972
Median Absolute Deviation (MAD): 0.43
Skewness: 3.065964091
Sum: 158734.798
Variance: 3.445902804

Histogram with fixed size bins (bins=10)

Common values

Value                  Count   Frequency (%)
-1                     29331   19.5%
0.8                    3875    2.6%
0.7                    3795    2.5%
0.9                    3032    2.0%
0.6                    2983    2.0%
1.1                    2056    1.4%
1                      1972    1.3%
0.5                    1839    1.2%
1.2                    1778    1.2%
1.3                    1531    1.0%
Other values (1614)    98430   65.3%

Minimum 5 values

Value   Count   Frequency (%)
-1      29331   19.5%
0.1     14      < 0.1%
0.11    3       < 0.1%
0.12    5       < 0.1%
0.13    3       < 0.1%

Maximum 5 values

Value   Count   Frequency (%)
24.95   1       < 0.1%
24.6    1       < 0.1%
24.3    1       < 0.1%
23.9    1       < 0.1%
23.87   1       < 0.1%
 

dischargelocation
Real number (ℝ)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.319189759796045
Minimum: -1
Maximum: 9
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: -1
5-th percentile: 4
Q1: 4
Median: 4
Q3: 7
95-th percentile: 8
Maximum: 9
Range: 10
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.852318566
Coefficient of variation (CV): 0.3482332178
Kurtosis: -1.07876761
Mean: 5.31918976
Median Absolute Deviation (MAD): 0
Skewness: 0.7210437704
Sum: 801187
Variance: 3.431084071

Histogram with fixed size bins (bins=10)

Common values

Value   Count   Frequency (%)
4       96566   64.1%
8       30193   20.0%
7       13366   8.9%
9       6269    4.2%
6       3372    2.2%
5       670     0.4%
-1      186     0.1%

Minimum 5 values

Value   Count   Frequency (%)
-1      186     0.1%
4       96566   64.1%
5       670     0.4%
6       3372    2.2%
7       13366   8.9%

Maximum 5 values

Value   Count   Frequency (%)
9       6269    4.2%
8       30193   20.0%
7       13366   8.9%
6       3372    2.2%
5       670     0.4%
 

visitnumber
Real number (ℝ≥0)

Distinct count: 8
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.0633307219396901
Minimum: 1
Maximum: 8
Zeros: 0
Zeros (%): 0.0%
Memory size: 1.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 2
Maximum: 8
Range: 7
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.2851711407
Coefficient of variation (CV): 0.2681866843
Kurtosis: 49.70515778
Mean: 1.063330722
Median Absolute Deviation (MAD): 0
Skewness: 5.837860666
Sum: 160161
Variance: 0.08132257948

Histogram with fixed size bins (bins=10)

Common values

Value   Count    Frequency (%)
1       142360   94.5%
2       7250     4.8%
3       824      0.5%
4       138      0.1%
5       32       < 0.1%
6       11       < 0.1%
7       5        < 0.1%
8       2        < 0.1%

Minimum 5 values

Value   Count    Frequency (%)
1       142360   94.5%
2       7250     4.8%
3       824      0.5%
4       138      0.1%
5       32       < 0.1%

Maximum 5 values

Value   Count    Frequency (%)
8       2        < 0.1%
7       5        < 0.1%
6       11       < 0.1%
5       32       < 0.1%
4       138      0.1%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
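The calculation described above can be sketched in plain Python. The function name and sample data are illustrative only; the 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centered products; the common 1/n factor cancels in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(pearson_r(xs, ys))
# Shifting and rescaling xs by a positive factor leaves r unchanged.
print(pearson_r([10 * x + 3 for x in xs], ys))
```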

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
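A minimal sketch of this rank-based calculation follows. The helper names are hypothetical, and assigning tied values the average of their positions is an assumption of this example, chosen because it is a common convention.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering of the values matters, the cubic relationship in the example is treated the same as a straight line.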

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
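The pair-counting definition above can be sketched directly. This corresponds to the variant usually called tau-a; the function name is illustrative, and statistical libraries often apply a tie correction (tau-b) that this sketch does not implement.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        # A pair is concordant when both variables move in the same direction.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie, counted as neither.
    return (concordant - discordant) / len(pairs)


# One swapped neighbour among four observations: 5 concordant, 1 discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```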

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available in the Phik project documentation.

Missing values

Sample

First rows

,df_index,patientunitstayid,verbal,motor,eyes,admitdiagnosis,thrombolytics,aids,hepaticfailure,lymphoma,metastaticcancer,leukemia,immunosuppression,cirrhosis,activetx,ima,midur,ventday1,oobventday1,oobintubday1,diabetes,creatinine,dischargelocation,visitnumber
0,0,141168,5,6,4,RHYTHATR,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,2.30,9,1
1,2,141194,4,6,3,SEPSISUTI,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2.51,4,1
2,3,141197,5,6,4,SEPSISPULM,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1.00,4,1
3,4,141203,1,3,1,RESPARREST,0,0,0,0,0,0,0,0,1,0,0,1,1,0,1,0.56,4,1
4,5,141208,5,6,3,ODSEDHYP,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1.00,7,1
5,6,141227,4,6,3,SEPSISPULM,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,1.90,6,1
6,7,141229,5,6,4,CHF,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,-1.00,4,1
7,8,141233,5,6,4,S-VALVMI,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,-1.00,4,1
8,9,141244,5,6,4,S-FEMPGRAF,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.65,4,1
9,10,141260,5,6,4,ASTHMA,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1.04,4,1

Last rows

,df_index,patientunitstayid,verbal,motor,eyes,admitdiagnosis,thrombolytics,aids,hepaticfailure,lymphoma,metastaticcancer,leukemia,immunosuppression,cirrhosis,activetx,ima,midur,ventday1,oobventday1,oobintubday1,diabetes,creatinine,dischargelocation,visitnumber
150612,163646,3353197,3,6,4,S-CABGAOV,0,0,0,0,0,0,0,0,1,1,0,0,1,1,0,0.71,4,1
150613,163647,3353198,1,4,2,COMA,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,1.01,4,4
150614,163648,3353200,5,6,4,HYPOVOLEM,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0.92,4,5
150615,163649,3353201,5,6,3,PLEUREFFUS,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,-1.00,4,3
150616,163650,3353213,-1,-1,-1,COMA,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0.68,7,1
150617,163651,3353216,1,5,1,S-CYSTOTH,0,0,0,0,0,0,0,0,1,0,0,1,1,1,0,0.73,7,1
150618,163652,3353226,-1,-1,-1,PLEUREFFUS,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,-1.00,9,1
150619,163653,3353235,5,6,4,CHF,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1.00,8,1
150620,163655,3353251,1,1,1,CARDARREST,0,0,0,0,0,0,0,0,1,0,0,1,1,1,1,2.43,8,1
150621,163656,3353254,5,6,4,LOWGIBLEED,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,-1.00,4,1